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1  SCOPE 


This  section  identifies  the  Ada  Compiler  Evaluation  Capability  (ACEC)  Release  2.0  product, 
states  its  purpose,  and  summarizes  the  purpose  and  contents  of  this  Final  Technical  Report. 

1.1  IDENTIFICATION 

This  is  the  Final  Technical  Report  for  the  ACEC  Release  2.0.  It  was  developed  by  Boeing's 
ACEC  group  under  contract  to  the  Wright  Research  and  Development  Center. 

The  first  release  of  the  ACEC  program  developed  a  Software  Product  consisting  of  a  suite 
of  benchmark  test  programs,  support  tools,  a  Read?'  s  Guide,  a  User's  Guide,  and  a  Version 
Description  Document  (VDDV  The  second  release  of  the  ACEC  corrected  problems  found  in 
the  first  release;  added  approximately  300  new  performance  tests,  assessors  for  program  library 
systems,  symbolic  debuggers,  and  system  diagnostics,  a  new  tool  to  simplify  the  preparation 
of  input  to  MEDIAN,  and  a  Single  System  Analysis  (SSA)  tool;  and  upgraded  supporting 
documentation  to  permit  a  user  to  assess  the  performance  of  Ada  compilation  systems. 

1.2  PURPOSE 

The  purpose  of  this  document  is  to  review  the  problems  encountered  and  the  lessons  learned 
in  the  process  of  developing  the  second  release  of  the  ACEC  Software  Product. 

The  descriptions  on  how  to  use  the  product  have  been  presented  in  the  Guides.  Information 
contained  in  the  Guides  will  not  be  repeated  here. 

The  numeric  results  of  the  ACEC  Release  2.0  testing  are  presented  in  the  ACEC  Software 
Test  Report  for  Release  2.0  and  will  not  be  repeated  here. 

1.3  INTRODUCTION 

The  ACEC  group  in  Boeing  Wichita's  Engineering  Software  and  Languages  Department  tested 
the  ACEC  Release  2.0  in  November  -  December  1989.  The  ACEC  was  tested  on  4  of  the  same 
5  trial  compilation  systems  (but  with  upgrades)  that  were  used  in  the  first  release  of  the  ACEC; 
DEC.  TeleSoft,  1750A  cross  compiler  (these  first  3  were  VAX  hosted),  and  the  Apollo.  The 
Silicon  Graphics  was  used  this  time  in  place  of  the  Harris  compilation  system. 

The  ACEC  Release  2  0  Software  Product  consists  of: 

•  A  suite  of  performance  test  programs 

•  A  set  of  supporting  packages 

These  include  programs  to  test  the  accuracy  of  elementary  math  functions,  programs  to 
test  the  accuracy  of  the  system  clock,  programs  to  include  the  timing  loop  in  the  test 
programs,  and  a  portable  implementation  of  a  math  library. 
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•  A  set  of  analysis  tools 

These  include  FORMAT,  MED.DATA.CONSTRUCTOR,  MEDIAN  and  Single  System 
Analysis  (SSA) 

•  A  set  of  assessors  for  diagnostics,  symbolic  debuggers,  and  program  library  systems 

•  Sample  command  files  to  compile  and  execute  the  tools  and  tests  for  a  VMS  and  for  a 
UNIX  based  system  which  can  be  used  as  guides  for  porting  to  other  systems 

This  report  assumes  that  the  reader  is  familiar  with  the  ACEC  Release  2.0  User  s  Guide 
the  Reader’s  Guide,  and  the  Version  Description  Document  (VDD).  Individual  test  problems 
are  referred  to  by  name  (in  capitals)  in  this  report.  The  reader  should  refer  to  the  VDD  Release 
2.0  for  a  description  of  the  problem,  information  from  the  Software  Test  Report  Release  2.0  is 
also  referred  to. 

This  document  reviews  the  development  of  the  second  release  of  the  test  suite  and  sup¬ 
porting  tools.  It  gives  an  analysis  of  results  and  a  review  of  the  lessons  learned. 


6 


2  APPLICABLE  DOCUMENTS 

The  following  documents  are  referenced  in  this  guide. 


2.1  GOVERNMENT  DOCUMENTS 

MIL-STD-1815A  Reference  Manual  for  the  Ada  Programming  Language  (LRM) 


2.2  NON-GOVERNMENT  DOCUMENTS 


D500- 12470-1 


D500-12471-1 


D500-12472-1 


D500-12467-1 


Ada  Compiler  Evaluation  Capability  (ACEC) 

Technical  Operating  Report  (TOR) 

User's  Guide  Release  2.0 
Boeing  Military  Airplanes 
P.O.  Box  7730 
Wichita,  Kansas 

Ada  Compiler  Evaluation  Capability  (ACEC) 

Technical  Operating  Report  (TOR) 

Reader's  Guide  Release  2.0 
Boeing  Military  Airplanes 

Ada  Compiler  Evaluation  Capability  (ACEC) 

Version  Description  Document  (VDD)  Release  2.0 
Boeing  Military  Airplanes 

Ada  Compiler  Evaluation  Capability  (ACEC) 

Software  Test  Report  Release  2.0 
Boeing  Military  Airplanes 

Understanding  Robust  and  Exploration  Data  Analysis, 
by  Hoaglin,  Mosteller,  and  Tukey 
John  Wiley  &  Sons,  Inc,  1983 
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3  ANALYSIS  OF  THE  TEST  SUITE 


The  numeric  values  obtained  by  executing  the  second  release  of  the  test  suite  were  presented 
in  the  Software  Test  Report  Release  2.0  and  will  not  be  duplicated  here. 

The  following  table  displays  the  test  problems  where  one  of  the  quartiles  was  flagged 
by  MEDIAN  as  exceptional.  Such  highly  variable  test  problems  might  suggest  cases  where 
systems  are  not  performing  comparable  operations.  These  lines  were  extracted  verbatim  from 
the  MEDIAN  report. 

TEST  PROBLEM 


NAME 

1  MIN 

LOWER 

MEDIAN 

UPPER 

MAX 

1 

SPREAD 

1016 

1  0.27- 

0.34- 

1.00 

2.90+ 

3.81  + 

1 

8.41 

1020 

1  0.27- 

0.23- 

1.00 

3.58  + 

3.67  + 

1 

12.79 

1024 

1  0 . 15< 

0.25- 

1.00 

3.96+ 

22 . 48> 

1 

15.71 

1025 

1  0 . 04< 

0.25- 

1.00 

3.93+ 

4.59+ 

1 

15.45 

1026 

1  0.29- 

0.30- 

1.00 

3.33  + 

26 . 45> 

1 

11.10 

1027 

1  0.21- 

0.29- 

1.00 

3.51  + 

25 . 28> 

1 

12.30 

1028 

1  0 . 07< 

0.37- 

1.00 

2.68+ 

6 . 59> 

1 

7.17 

1030 

1  0 . 07< 

0.38- 

1.00 

2.66+ 

6 . 57> 

1 

7 . 10 

10_80_20_ 1 

1  0.20- 

0.28- 

1.00 

3.63+ 

4.01  + 

1 

13.17 

10,80. 20.2 

1  0 . 12< 

0. 17< 

1 .00 

5 . 90> 

6 . 37> 

1 

34.80 

10,80,20.3 

1  0 . 09< 

0.33- 

1.00 

3.04  + 

3.57+ 

1 

9.22 

I0.INTER3 

1  0.10< 

0. 13< 

1.00 

7 . 73> 

11 ,20> 

1 

59.82 

I0.PATTERN2 

1  0 . 07< 

0. 15< 

1 .00 

6 . 67> 

9 . 51> 

1 

44.49 

I0.PATTERN3 

1  0 . 07< 

0. 16< 

1.00 

6. 13> 

8 . 80> 

1 

37.52 

I0.PATTERN4 

1  0 . 08< 

0. 17< 

1.00 

5 . 73> 

9 . 48> 

1 

32.87 

I0.PATTERN6 

1  0 . 05< 

0. 12< 

1.00 

8 . 21> 

13 . 52> 

1 

67.45 

I0.PATTERN7 

1  0 . 05< 

0. 13< 

1.00 

7 . 74> 

12 . 04> 

1 

59.93 

I0.PATTERN8 

1  0 . 06< 

0. 13< 

1.00 

7 . 42> 

12 . 81> 

1 

54.99 

I0.RECUR1 

1  0 . 12< 

0.21- 

1  .00 

4.72+ 

6 . 68> 

1 

22.27 

I0.RECUR2 

1  0.21- 

0.28- 

1.00 

3.62  + 

9 . 24> 

1 

13.13 

I0.RECUR3 

1  0.27- 

0.35- 

1  .00 

2.84  + 

7. 19> 

1 

8.06 

I0.SCAN4 

I  0 . 07< 

0. 16< 

1.00 

6 . 3 1  > 

10 . 35> 

1 

39.77 

I0.SCAN8 

1  0 . 05< 

0. 12< 

1.00 

8 . 49> 

14 . 77> 

1 

72.01 

SS153 

1  0 . 06< 

0.35- 

1.00 

1 .21 

4.95  + 

1 

3.41 

SS602 

1  0 . 06< 

0.37- 

1.00 

1.21 

5.12  + 

1 

3.22 

SS753 

1  0 . 09< 

0.35- 

1.00 

1.28 

5 . 61> 

1 

3.62 

SS754 

1  0 . 09< 

0.37- 

1.00 

1.25 

5 . 57> 

1 

3.35 

SS802 

1  0 . 03< 

0.24- 

1.00 

1  . 19 

1.43 

1 

5.04 
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The  following  list  describes  the  features  for  each  fiagged  test  problem  icsponsible  for  the 
performance  variations  as  identified  by  the  ACEC  group 

TEST  PROBLEM 

NAME  I  FEATURES 


1016 


1020 


1024 

1025 

1026 

1027 

1028 
1030 

10.80.20.1 

10.80.20.2 

10.80.20.3 

I0.INTER3 

I0.PATTERN2 

I0.PATTERN3 

TP  •  '»'Trn»!  a 

a  w  n  a  ;  k...n  1 

I0.PATTERN6 

I0.PATTERN7 

I0.PATTERN8 

TO. RECUR 1 

I0.RECUR2 

I0.RECUR3 

I0.SCAN4 

10. SCAN8 

SS 153 

SS602 

55753 

55754 
SS802 


I  References  SIZE  function  of  a  DIRECT  file.  Performance 
depends  on  underlying  operating  system.  Some  may  not 
require  disc  reference. 

I  References  END.0F.FILE  function  of  a  SEQUENTIAL  file. 

Performance  depends  on  underlying  operatine  system.  Some 
may  not  require  disc  reference. 

I  Console  1/0  device  dependent 
I  Console  1/0 
I  Console  1/0 
I  Console  1/0 
I  Console  1/0 
I  Console  1/0 

I  Direct  file  random  reads  from  a  small  set  of  records 

I  Direct  file  random  reads  from  a  small  set  of  records 

I  Direct  file  random  reads  from  a  small  set  of  records 

I  Direct  file  random  read/write  in  a  small  set  of  records 

I  Direct  file  random  reads  from  a  small  set  of  records 

I  Direct  file  random  reads  from  a  small  set  of  records 

!  Direct  fili  random  it—d'  from  a  small  set  of  records 

I  Direct  file  random  writes  to  a  small  set  of  records 

I  Direct  file  random  writes  to  a  small  set  of  records 

I  Direct  file  random  writes  to  a  small  set  of  records 

I  Direct  file  random  reads  from  a  small  set  of  records 

I  Direct  file  random  reads  from  a  small  set  of  records 

I  Direct  file  random  reads  from  a  small  set  of  records 

I  Direct  file  random  reads  from  a  small  set  of  records 

I  Direct  file  random  writes  to  a  small  set  of  records 

I  Raises  exceptions 

I  Raises  exceptions 

I  Raises  exceptions 

I  Raises  exceptions 

I  Call  on  CALENDAR. SECONDS 
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3.1  CONFIRMED  EXPECTATIONS 

The  following  subsections  l,st  design  decisions  which  worked  well 

3.1.1  MEDIAN 

If  comparing  multiple  systems,  some  type  of  statistical  analysis,  similar  to  MEDIAN,  is  necessary 
for  practical  use  of  a  test  suite  with  more  than  a  handful  of  test  problems  The  analysis  focuses 
the  attention  of  the  ACEC  users  on  the  test  problems  with  unusual  results  It  permits  a 
form  of  report-by-exception  where  ACEC  users  can  concentrate  their  effotts  on  exploring  the 
test  problems  with  anomalous  performance.  Since  most  test  problems  will  not  be  flagged  bv 
MEDIAN  as  outliers.  ACEC  users  will  be  able  to  "skim  over  most  of  them  and  concentrate 
on  those  where  large  differences  between  systems  were  observed 

Without  an  analysis  tool  a  test  suite  would  require  users  to  understand  each  of  the  test 
problems  at  least  to  know  if  a  result  or,  one  system  was  good.  bad.  or  indifferent  The  residual 
matrix  is  very  helpful  since  it  establishes  a  normalized  metric  for  test  results  —  a  residual  close 
fo  one  is  typical  "  No  analysis  tool  can  extract  more  information  from  a  set  of  data  than  is 
implicit  in  the  collection  of  raw  measurements,  but  it  can  make  the  relationships  between 
data  more  apparent  It  would  be  very  easy  to  overlook  the  fact  that  a  system  executes  some 
test  problems  twice  as  fast  as  typical  when  all  the  data  is  presented  as  one  large  table  of  timing 
measurements  It  is  important  to  prevent  users  from  being  overwhelmed  by  the  volume  of  data 
and  MEDIAN  serves  this  role 

3.1.2  Single  System  Analysis 

The  Single  System  Analysis  tool  provides  an  automated  way  of  comparing  and  displaying  results 
from  sets  of  related  performance  test  problems  The  ACEC  team  has  manually  compared  results 
of  related  tests  before  the  development  of  this  tool,  however,  the  speed  and  automated  nature 
of  the  tooi  now  make  it  much  simpler  to  do  l  his  ease  of  use  encourages  ACEC  us®'*  obtain 
the  full  benefits  of  collecting  the  performance  data 

The  Single  System  Analysis  tool  is  straightforward  in  principle  and  no  unanticipated  prob¬ 
lems  arose  during  its  development  and  use  other  than  on  the  Silicon  Graphics.  The  system 
appears  to  have  a  problem  with  exception  handling. 

3.1.3  Conventions  For  Valid  Portable  Test  Problems 

Writing  valid  portable  benchmarks  is  not  easy.  It  is  necessary  to  be  careful  in  constructing 
test  problems  so  that  they  are  not  unduly  optimizable  This  was  anticipated  and  confirmed 
past  experience  The  conventions  established  for  developing  tests  were  generally  effective  in 
precluding  unexpected  optimizations 
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*.i.l  INCLUDE 

INCLUDE  is  an  ACEC  support  tool  used  to  textuallv  expand  Ada  source  text  It  is  used  to 
insert  the  timing  loop  code  into  the  test  programs  Refer  to  the  User's  Guide  for  a  discussion 
on  the  use  of  this  tool 

The  decision  to  use  a  separate  tool  to  INCLUDE  the  timing  loop  code  has  worked  well  It 
has  permitted  flexible  modification  of  the  timing  code 

•  To  modify  the  basic  timing  loop  code  as  needed 

•  To  switch  between  (  entrai  Processing  Unit  (CPU)  time  and  elapsed  time  measurements 

•  To  accommodate  implementation  dependencies  (such  as  the  GETADR  calls  to  measure 
code  expansion  size  when  the  label  ADDRESS  attribute  did  not  work) 

t.l.a  Timing  Loop  (  'ode 

I  he  timing  loop  code  is  responsible  for  measuring  the  execution  time  and  code  expansion  for 
each  tost  problem  It  is  discussed  in  depth  in  the  Reader  s  Guide,  including  the  constraints  on 
■  t s  design  and  the  conventions  on  writing  test  problems  so  that  the  test  designer  can  measure 
what  was  intended 

T  he  timing  loop  code  was  enhanced  from  the  first  release  in  several  ways 

1  The  termination  condition  for  the  inner  timing  loop  has  incorporated  a  statistically  robust 
confidence  test  adapted  from  Understanding  Robust  and  Exploration  Data  Analysis,  by 
Hoaglin  Mosteller.  and  Tukev  An  unreliable  measurement  indicator  is  printed  when 
either  a  t  test  or  an  r-test  fails  —  the  outer  timing  loop  uses  the  t-test  to  terminate  with 
consistent  measurements  because  this  is  a  more  conservative  test 

This  has  been  a  valuable  change  because  it  reduces  the  number  of  test  problems  which 
are  unnecessarily  reported  as  unreliable  measurements 

2  Most  of  the  cod*3  associated  with  the  timing  loop  in  STOPTIMEO  has  been  moved  into 
a  procedure  in  GLOBAL 

This  is  valuable  because  it  reduces  the  size  of  the  generated  code  for  multiple  problem  test 
programs  and  reduces  compilation  time  It  saves  ninetv-fn/n  lines  of  rode  ppr  problem 

3  The  printing  of  time  measurements  has  been  modified  so  that  an  exponential  format  is 
used  for  values  less  than  ten  microseconds  or  greater  than  one  hundred  seconds. 

This  is  valuable  because  using  the  scheme  from  the  first  release,  some  of  the  small  test 
problems  on  fast  machines  would  be  truncated  to  one  significant  digit  when  printed, 
making  the  effort  expended  to  calculate  measurements  to  a  five  per  cent  confidence 
interval  unnecessarily  precise  The  processing  for  large  times  is  also  necessary  (some  of 
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the  I/O  test  problems  developed  for  the  second  release  product  execute  for  more  than 
100  seconds  on  some  of  the  trial  systems)  so  that  FORMAT  can  recognize  the  large 
times 

4  Sometimes  measurement  noise  makes  the  timing  loop  code  calculate  a  small  positive  time 
for  test  problems  which  have  been  optimized  into  a  null.  The  timing  loop  was  modified 
to  check  whether  the  code  expansion  size  measurements  indicate  that  the  test  problem 
has  beer,  optimized  into  a  null  and  to  print  a  zero  time  in  this  case 

5  In  the  first  release,  the  ACEC  timing  loop  reported. all  test  problems  with  uncorrected 
execution  times  less  than  the  null  loop  time  as  zero.  The  timing  loop  was  modified  to  test 
for  the  case  where  subtracting  the  null  loop  time  from  the  total  problem  execution  time 
to  calculate  the  effective  test  problem  execution  time  would  produce  a  negative  value 
which  is  larger  than  can  be  attributed  to  measurement  noise.  If  the  computed  negative 
time  is  small  (less  than  the  variations  in  the  null  loop  during  initialization  of  the  timing 
loop  m  executing  each  test  program),  it  is  reported  as  zero.  A  large  negative  value 
will  result  in  an  unreliable  measurement  error  code. 

This  change  was  valuable  because  it  tells  users  when  unusual  behavior  is  observed. 

6  The  timing  loop  was  modified  to  detect  when  a  null  statement  is  being  evaluated  and  to 
not  perform  the  maximum  number  of  iterations  in  this  case. 

The  benefit  of  this  change  is  that  it  saves  some  execution  time  when  running  test  problems 
optimized  into  null  statements. 

7  The  timing  loop  (INITTIME)  was  modified  to  explicitly  test  whether  the  system  has 
taken  special  steps  to  return  a  unique  value  for  CALEN DAR.TIM E  each  time  it  is  called 
Some  operating  systems  have  implemented  the  time  function  such  that  every  value  is 
guaranteed  to  have  a  unique  value  during  each  day  —  where  their  actual  clock  resolution 
is  not  as  accurate  as  would  be  necessary  to  satisfy  this  system  requirement  with  the 
clock  value,  they  simply  track  the  number  of  calls  on  the  TIME  function  since  the  actual 
dock  "ticked"  and  return  as  the  function  result  the  actual  time  plus  the  number  of  calls 
(having  ensured  that  on  their  fastest  system,  the  number  of  calls  which  can  be  made 
on  the  TIME  function  before  the  clock  "ticks  is  less  than  the  resolution  of  the  return 
tvpe)  On  these  systems,  t h *»  low  order  hits  are  essentially  iispIpss  as  indicators  of  actual 
time.  On  first  encountenng  this  beliavioi.  many  programmers  may  think  it  is  a  strange 
way  for  a  system  to  keep  time,  however  it  does  have  utility  —  some  low-level  programs 
are  simplified  if  they  can  assume  that  time-stamps  are  unique  and  form  a  monotonic 
sequence  It  is  typical  for  Ada  systems  to  "pass-through  the  operating  system  function 
for  the  TIME  function,  so  that  this  behavior  is  not  hidden  from  Ada  programs.  The 
first  released  ACEC  timing  loop  could  calculate  inaccurate  times  —  such  systems  render 
the  software  vernier  calibration  process  that  the  ACEC  uses  futile  The  second  release 
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of  the  ACEC  tests  on  initialization  for  this  behavior  and  if  detected,  uses  a  large  value 
for  MIN  JITTER  COMPENSATION  in  the  timing  loop  to  essentially  make  the  ACEC 
calculations  not  rely  on  the  vernier 

This  change  is  valuable  because  if  it  were  not  compensated  for,  the  ACEC  timing  loop 
could  produce  inaccurate  estimates  and  error  bounds  on  some  systems  for  fast  executing 
test  problems. 

)t.  1 .6  FORMAT-MEDIAN  Interface 

The  exec  ..tion  of  the  ACEC  test  suite  produces  a  log  file  containing  performance  measurements 
for  each  test  run.  The  log  file  must  be  processed  before  the  measurement  data  can  be  input  to 
MEDIAN  The  method  used  in  the  first  release  to  interface  the  results  file  and  the  MEDIAN 
analysis  tool  was  awkward  The  first  release  required  the  user  to  execute  FORMAT  to  extract  a 
data  aggregate,  use  a  text  editor  to  insert  the  aggregate  into  MED-DATA.  and  then  recompile 
MED. DATA  and  MEDIAN 

For  the  second  release,  the  MED  DATA  CONSTRUCTOR  tool  was  written  to  automate 
the  process  of  creating  a  MED  DATA  package  from  the  outputs  of  FORMAT  from  differ¬ 
ent  systems.  The  user  must  execute  FORMAT  to  create  data  aggregates  and  then  execute 
MED  DATA  CONSTRUCTOR  to  create  a  MED  DATA  package  containing  data  for  the  spec¬ 
ified  systems  MED  DATA  and  MEDIAN  are  then  compiled,  and  MEDIAN  is  executed.  The 
user  must  use  a  text  editor  to  create  a  file  containing  the  system  names  and  the  names  of  the 
data  files  to  be  used  by  MED-DATA. CONSTRUCTOR.  The  user  is  not  required  to  edit  the 
data  aggregates. 

Another  approach  would  have  been  to  combine  in  one  program  the  functions  of  FORMAT, 
MED-DATA. CONSTRUCTOR,  and  MEDIAN.  This  was  not  done  because  the  ACEC  interface 
must  deal  with  bare  machines  which  have  limited  I/O  capabilities. 

The  method  selected  permits  the  collection  and  processing  of  results  on  target  machines 
which  have  only  a  console  output  device  (that  is,  no  way  to  write  to  a  file  for  processing  by 
another  program).  This  constraint  limits  the  design  options  and  the  degree  of  "friendliness'' 
possible  —  the  ACEC  would  be  friendlier  if  each  test  program  updated  a  file  to  reflect  results, 
then  MEDIAN  (or  other  analysis  tools)  could  read  this  file. 

The  current  interface  provides  some  important  advantages. 

1.  The  order  in  which  a  user  executes  test  programs  is  not  important 

2.  Users  do  not  have  to  create  or  update  a  large  positional  array.  It  would  be  error  prone 
to  manipulate  an  aggregate  with  over  a  thousand  entries  —  the  risks  of  recording  results 
from  one  test  problem  as  coming  from  another  test  problem  would  be  large.  Aggregates 
with  named  associations  are  helpful  here. 
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3  It  is  a  flexible  system.  It  is  simple  for  a  user  to  incorporate  additional  test  programs  into 
the  suite,  as  long  as  they  avoid  naming  conflicts  with  existing  test  problems.  It  is  simple 
for  a  user  to  use  MEDIAN  to  analyze  subsets  of  the  data,  such  as  only  the  tasking  tests. 

.3.1.7  Guides 

The  writing  of  effective  user  documentation  is  difficult  and  time  consuming.  It  is  also  very 
important  to  the  usability  of  a  product,  particularly  when  there  is  no  provision  for  telephone 
hot-lines'  for  helping  users  who  run  into  problems. 

The  ACEC  user  documentation  is  extensive  (over  400  pages)  and  comprehensive. 

The  User  s  Guide  and  Reader’s  Guide  present  an  introduction  to  the  issues  involved  in 
measuring  performance  They  discuss: 

•  Compiler  optimization 

•  Aspects  of  machine  design  which  influence  performance 

•  Pitfalls  in  measuring  system  performance 

•  Statistics  of  collecting  and  analyzing  performance  data 

They  provide  helpful  background  information  to  ACEC  users  about  performance  evaluation  in 
general  and  Ada  in  particular.  They  also  provide  detailed  instructions  for  using  the  ACEC. 

The  ACEC  Version  Description  Document  (VDD)  contains  useful  information  about  the 
test  suite  itself  and  the  individual  test  problems. 

The  general  reaction  to  the  first  release  of  the  ACEC  documentation  was  favorable. 

3.1.8  Assessors 

Release  2  of  the  ACEC  includes  new  assessors  for  the  Ada  program  library  management  system, 
the  diagnostic  messages,  and  the  symbolic  debugger  The  assessors,  especially  the  debugger 
assessor,  were  found  to  be  more  time-consuming  to  execute  than  originally  predicted  This  i> 
due  to  two  factors: 

•  Adaptation  effort. 

The  assessors  test  a  wide  range  of  capabilities  and  a  large  number  of  individual  actions 
which  must  be  separately  adapted  to  each  system.  The  effort  required  is  significant,  even 
when  the  evaluator  is  familiar  with  the  product. 

The  program  library  assessor  includes  19  scenarios  with  80  separate  questions. 

The  diagnostic  message  assessor  includes  45  scenarios  with  455  separate  questions. 

The  symbolic  debugger  assessor  includes  29  scenarios  with  118  separate  questions. 
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•  Learning  time. 

There  was  a  learning  time  required  for  an  evaluator  to  become  familiar  with  a  system 
not  previously  used. 

In  order  to  perform  the  library  assessor  and  the  debugger  assessor,  the  evaluator  must 
learn  implementation  specific  program  library  and  debugger  commands  and  concepts. 

It  is  necessary  to  adapt  the  commands  used  to  execute  each  scenario.  The  assessors 
cover  a  broad  scope,  and  before  deciding  that  some  capability  was  not  supported,  it 
was  necessary  to  stop  and  review  manuals  to  make  sure  that  something  had  not  been 
overlooked  in  the  initial  learning  process. 

The  initial  projections  for  learning  time  were  based  on  the  assumption  that  the  trial 
systems  would  be  similar  to  the  system  the  assessors  were  developed  on.  This  was  overly 
optimistic. 

We  believe  that  ACEC  users  will  find  the  effort  worthwhile  because  it  will  provide  them 
with  more  organized  information  about  the  capabilities  of  systems  than  they  would  acquire 
with  a  similar  amount  of  time  spent  in  unstructured  experimentation.  Although  vendors 
can  tell  their  users  that  their  library  system  and/or  the  debugger  provide  all  the  useful 
capabilities,  running  the  ACEC  assessors  will  assure  the  testers  that  the  capabilities  exist 
and  that  they  know  how  to  use  them. 

3.1.9  Portable  Math 

The  second  release  of  the  ACEC  added  a  representation  independent  version  of  MATH-DEPENDENT, 
greatly  increasing  the  portability  of  the  ACEC  provided  generic  math  library  and  therefore,  sim¬ 
plifying  the  programmer  effort  required  to  produce  a  working  math  library. 

There  was  one  trial  system  which  provided  a  library  which  was  not  usable  for  an  interesting 
reason  That  system  named  the  provided  math  library  '’MATH''  —  the  same  name  ACEC  uses 
for  its  math  package.  This  presents  a  naming  conflict  which  could  not  be  resolved  without 
extensive  modifications  to  the  source  text  of  the  ACEC  test  programs  and  analysis  tools  or 
modifications  to  the  source  of  the  implementation  provided  math  library.  Since  neither  of  these 
alternatives  was  acceptable,  on  this  system  testing  proceeded  using  the  ACEC  provided  portable 
math  library. 

The  ACEC  math  package  specification  (and  all  tl-ip  programs  referencing  t|ie  specification ) 
were  changed  to  be  compatible  with  a  subset  of  the  Association  for  Computing  Machinery. 
Special  Interest  Group  on  Ada.  Numerics  Working  Group  (NUMWG)  recommendations.  This 
change  may  be  of  increasing  value  in  the  future  if  and  when  more  implementations  adapt  the 
NUMWG  recommendations.  Only  one  of  the  trial  systems  provided  a  NUMWG  implementation, 
and  on  this  trial  system  there  were  no  problems  with  using  the  vendor  provided  implementation 
in  place  of  the  ACEC  provided  math  package. 
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3.2  PROBLEMS  ENCOUNTERED 


During  the  development  of  the  ACEC,  the  test  suite  and  tools  were  executed  on  multiple 
systems  both  to  verify  portability  of  the  code,  and  to  provide  sample  data  to  demonstrate  the 
comparative  analysis  tools.  These  are  the  "trial"  systems  referred  to  throughout  this  report. 
During  this  process  some  implementation  errors  and  restrictions  were  discovered  in  the  trial 
systems,  as  detailed  below: 

•  Some  systems  did  not  support  the  label'ADDRESS  clause  (used  in  the  ACEC  to  measure 
code  expansion  size). 

•  Most  systems  do  not  support  tying  tasks  to  interrupts  (a  Chapter  13  feature). 

•  Most  systems  do  not  support  a  type  SMALL  specifying  a  fixed  point  delta  which  is  not 

a  power  of  two. 

•  Most  systems  did  not  support  asynchronous  I/O  operations  —  that  is.  an  I/O  operation 
in  a  task  causes  the  yvmjram  to  halt  until  the  I/O  completes. 

•  A  few  systems  failed  to  always  reclaim  implicitly  allocated  space.  This  prevented  some 
test  problems  from  completing.  The  LRM  does  not  require  that  the  space  be  reused,  so 
this  is  an  implementation  restriction  rather  than  an  error. 

•  Some  systems  restrict  the  type  of  files  that  can  be  processed  Test  problems  using  a 

SEQUENTIALJO  instantiated  with  an  unconstrained  type  (that  is,  STRING)  were  not 
accepted  by  many  systems,  as  were  problems  using  sequential  and  direct  files  with  a 
variant  record.  This  feature  is  not  supported  by  many  of  the  trial  systems.  On  the 

DEC  Ada  system  the  test  problems  would  not  run  using  the  default  FORM  strings  and 

a  system  dependent  adaptation  was  necessary. 

•  The  test  problem  (SS747)  which  calls  on  an  assembly  language  procedure  did  not  work 
on  all  systems.  In  several  cases,  the  trial  systems  claim  to  support  the  facility,  but  the 
provided  documentation  was  unclear,  and  the  initial  attempt  to  adapt  the  test  problem 
failed. 

•  Capacity  limitations  —  some  programs  were  too  large  to  be  compiled  by  some  of  the 
systems. 

•  The  implementation  dependent  type  SYSTEM  ADDRESS  is  converted  to  an  integer  type 
to  calculate  sizes.  The  size  of  this  integer  type  is  implementation  dependent  and  may 
need  to  be  adapted  for  use  on  different  systems. 

•  Some  implementations  imposed  restrictions  on  length  clauses  Several  only  supported 
the  declaration  of  specific  predefined  sizes.  For  example,  several  would  not  accept  a 
specification  for  a  three  bit  wide  field. 
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•  Preemptive  priority  scheduling  was  not  supported  on  all  the  trial  systems.  This  caused 
some  of  the  test  problems  to  report  a  runtime  error. 

•  Some  compilation  systems  did  not  support  integer  types  which  required  more  than  16 
bits.  This  restriction  prevented  some  test  programs  from  operating 

•  One  system  vfor  an  embedded  target)  did  not  support  any  file  I/O. 

•  Some  systems  did  not  support  an  option  to  specify  tasking  discipline  (time-sliced  vs  run- 
till- blocked ) .  This  is  not  required  but  tests  for  it  were  included  in  the  second  release. 

•  Particularly  with  the  Silicon  Graphics  system,  there  were  test  problems  which  were  mea¬ 
sured  as  taking  zero  time  but  were  not  NULL  statements  (they  had  positive  code  ex¬ 
pansion  sizes,  performed  non-optimizable  operations,  and  code  was  generated).  In  some 
cases,  when  the  test  programs  were  rerun,  the  spurious  zero  measurements  did  not  reoc¬ 
cur,  but  this  was  not  universal. 

Initial  investigation  is  inconclusive  and  more  testing  with  the  system  would  be  required 
to  isolate  the  source  of  the  problem,  which  might  be  applicable  to  other  compilation 
systems. 

Although  the  portability  of  the  test  suite  could  have  been  improved  by  restricting  it  to 
use  only  the  features  supported  by  the  most  limited  of  the  known  Ada  implementations,  this 
was  not  done.  It  is  important  to  evaluate  all  the  capabilities  of  the  language.  Some  of  the 
features  not  supported  by  all  systems  are  important  to  some  users  and  need  to  be  covered  in 
a  comprehensive  test  suite. 
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3.3  MEDIAN  ANALYSIS  LIMITATIONS 


One  potential  criticism  of  the  ACEC  product  is  that  the  system  factors  computed  by  MEDIAN 
will  not  reflect  the  performance  of  Ada  implementations  on  the  types  of  applications  all  users 
are  interested  in.  MEDIAN  computes  system  factors  by  averaging  over  the  test  problems.  If  the 
distribution  of  language  features  is  very  different  in  a  user's  application  than  the  distribution 
in  the  ACEC  test  suite,  the  system  factors  may  not  accurately  predict  performance  of  different 
systems  on  the  user's  ultimate  application.  For  example,  comparing  two  systems,  one  of  which 
provides  floating  point  support  by  software  simulation  and  one  with  hardware  support,  the 
system  with  software  floating  point  will  be  slower.  For  a  user  whose  applications  will  not  use 
floating  point,  the  differences  due  to  floating  point  speed  are  not  important  and  such  a  user 
would  prefer  that  the  speed  of  floating  point  operations  not  influence  the  evaluation  of  systems 

The  above  situation  reflects  the  fact  that  the  system  factors  are  biased  by  the  selection  of 
test  problems.  There  are  several  issues  involved  here: 

1.  Some  limitation  is  unavoidable.  If  there  was  only  one  test  problem,  critics  could  complain 
that  the  problem  does  not  represent  the  usage  of  language  features  they  anticipate 
for  their  applications.  If  there  are  multiple  test  problems,  covering  different  language 
features,  critics  could  complain  that  the  combination  of  the  test  problems  does  not 
reflect  their  anticipated  usage. 

2.  The  limitation  is  not  unreasonable.  Users  with  specific  interests  may  find  information 
about  them  —  16  or  32  bit  integer  operations,  execution  with  or  without  constraint 
checking  suppressed,  etc. 

The  ACEC  makes  heavy  use  of  variables  declared  in  library  packages  (as  contrasted  to 
variables  declared  in  local  scopes).  This  reflects  our  expectations  of  typical  usage.  The 
ACEC  also  contains  test  problems  using  locally  declared  variables  so  differences  between 
scopes  can  be  observed.  The  fraction  of  operations  on  integer,  floating-point,  and  fixed- 
point  types,  and  the  precisions  of  the  operations  are  application  sensitive.  The  computed 
system  factors  will  reflect  the  distribution  of  usages  in  the  test  suite.  Users  interested  in 
the  performance  of  specific  language  features  can  examine  the  test  problems  using  these 
features,  by  reference  to  the  VDD  appendices.  If  some  language  features  are  particularly 
slow  on  one  system,  MEDIAN  will  flag  test  problems  using  this  feature  as  outliers,  drawing 
attention  to  the  performam-p  icsite. 

3.  It  is  easy  for  ACEC  users  to  select  a  subset  of  test  problems  (perhaps  including  additional 
test  problems  they  develop  themselves)  to  reflect  their  expected  usage.  For  example,  a 
user  may  decide  to  ignore  all  the  file  I/O  test  problems  if  their  project  is  targeted  for  a 
bare  machine;  for  applications  targeted  to  a  general  purpose  multiuser  system,  it  may  be 
best  not  to  consider  the  interrupt  test  problems 
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3.4  POSSIBLE  ENHANCEMENTS 

Several  possible  enhancements  to  the  ACEC  product  have  been  mentioned  at  conferences  by 
ACEC  users.  Some  of  these  have  been  included  in  the  second  ACEC  release.  Those  that  are 
not  yet  available  are  defined  in  the  following  paragraphs. 

•  Capacity  testing 

The  ACEC  does  not  include  any  explicit  tests  designed  to  systematically  determine  ca¬ 
pacity  limitation  at  compile,  link,  or  execution  time.  Some  implicit  capacity  testing  is 
inherent  in  the  organization  of  the  test  suite  and  the  assumption  that  a  system  will  be 
able  to  compile  and  execute  the  programs  in  the  ACEC  Software  Product.  However,  it 
would  be  useful  to  incorporate  explicit  capacity  testing. 

•  Systematic  compile  speed  testing 

The  ACEC  provides  for  the  recording  and  analysis  (through  MEDIAN)  of  the  elapsed 
time  to  compile  and  link  the  performance  test  programs.  However,  these  test  programs 
are  not  designed  to  contain  selections  of  language  features  (distributions  of  features) 
expected  to  cause  variations  in  compile/link  times.  It  would  be  useful  to  incorporate  test 
programs  explicitly  designed  to  emphasize  features  expected  to  result  in  different  compile 
times 

•  Memory  size 

It  has  been  suggested  that  the  ACEC  be  enhanced  to  include  programs  to  test  execution 
time  memory  capacity  limits 

•  Additional  problems 

Although  large,  the  ACEC  is  not  exhaustive  in  its  testing.  Problems  of  interest  to  specific 
application  areas  could  be  added  to  increase  the  utility  of  the  ACEC. 

•  Reliability  of  timing  measurements 

The  ACEC  timing  loop  code  has  evolved  during  the  development  of  the  product,  and 
there  are  several  enhancements  which  could  be  explored  in  order  to  improve  its  ability 
to  efficiently  produce  reliable,  portable  timing  measurements.  Of  particular  concern  is 
the  tendencv  on  one  of  the  trial  systems  for  unreliable  codes  to  be  reported  rather  than 
time  measurements  —  it  may  be  valuable  to  experiment  with  increasing  the  maximum 
number  of  outer  timing  loop  cycles  (GLOBAL. MAX. ITERATION  COUNT  and  associated 
variables,  including  the  values  of  the  array  GLOBAL. T.VALUE). 

•  User  interface 

Several  users  have  criticized  aspects  of  the  ACEC  user  interface,  as  listed  below: 
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-  Multiple  problem  programs 

Some  of  the  ACEC  performance  test  programs  contain  multiple  problems.  When 
one  problem  in  a  program  fails  to  compile,  no  results  are  obtained  for  any  of  the 
problems  in  that  program  unless  the  ACEC  user  modified  the  source  program  to 
correct  (or  remove)  the  failing  problem.  It  has  been  suggested  that  each  test 
problem  be  presented  as  a  separate  compilation  unit. 

-  Awkward  data  interface 

The  FORMAT  -  MEDIAN  interface  in  release  one  has  been  criticized  as  awkward 
and  the  time  to  re-analyze  data  as  long.  If  MEDIAN  were  to  read  a  set  of  FORMAT 
aggregates,  analysis  of  subsets  of  data  would  be  quicker  and  simpler  to  perform. 
The  MED-DATA-CONSTRUCTOR  tool  introduced  with  the  second  release  should 
help. 

-  MATH  package 

The  first  release  of  the  ACEC  was  criticized  because  significant  user  effort  was 
required  to  adapt  the  MATH  package  to  a  new  target.  Three  steps  taken  in  the 
second  release  should  greatly  ease  this  effort. 

*  In  the  second  release,  package  MATH  is  NUMWG  compatible,  making  it  easy 
to  use  an  implementation  provided  math  library  on  targets  which  support  the 
NUMWG  recommendations. 

*  A  representation  independent  implementation  of  package  MATH  DEPENDENT 
is  provided  which  makes  porting  the  ACEC  provided  package  GEN  .MATH 
straightforward. 

*  The  User's  Guide  discussion  of  math  porting  issues  has  been  expanded  making 
it  clear  that  the  users  should  try  to  interface  to  an  implementation  provided 
math  library  where  available. 

•  Weighting  of  test  problems  for  analysis 

There  is  no  current  provision  for  weighting  test  problems  based  on  importance.  Some 
users  have  requested  an  easy  way  of  isolating  test  problems  of  interest  to  them  and  of 
analyzing  performance  data  emphasizing  specific  areas.  The  ACEC  users  have  had,  and 
continue  to  have,  the  option  of  using  MEDIAN  on  any  subset  of  problems  they  wanted 
to  consider.  This  suggestion  is  related  to  the  alternative  analysis  techniques  suggested 
for  MEDIAN  statistical  processing. 

•  MEDIAN  statistical  processing 

Some  users  have  suggested  that  the  ACEC  statistical  analysis  technique  be  modified 
to  provide  for  inference  -  confidence  intervals  and  significant  differences.  I  nere  ts  some 
statistical  research  which  is  suggestive  of  ways  to  add  confidence  estimates  while  retaining 
robust  statistical  analysis,  but  the  issue  would  require  more  study 
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•  User  Documentation 

Some  users  have  suggested  that  the  ACEC  documentation  be  expanded  to  include  all 
issues  which  might  affect  compiler  selection. 
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4  SUMMARY 


Significant  effort  was  expended  to  address  criticisms  of  the  original  ACEC  release.  All  of  the 
reported  problems  with  Release  1.0  were  either  addressed  in  this  second  release  or  identified 
as  possible  future  enhancements.  There  were  no  major  surprises  with  the  ACEC  Release  2.0 
product.  The  ACEC  found  there  were  general  improvements  in  the  Ada  compilers  from  Release 
10  to  2.0  However,  there  was  still  one  trial  system  that  was  barely  usable  due  to  system 
crashes  and  destroyed  directories 

The  new  features  of  this  release  were:  300  new  performance  tests  (1/0,  implicit  storage 
reclamation,  application  profile  tests,  etc  );  assessors  (library  management,  symbolic  debugger, 
and  diagnostic  messages),  the  MED  .DATA  CONSTRUCTOR;  and  the  Single  System  Analysis 
( SSA)  tool. 

Even  though  approximately  300  performance  tests  were  added,  the  total  lines  of  "included" 
source  remained  constant  due  to  a  change  in  the  timing  loop  code.  Also,  changes  to  the 
example  COM  files  for  the  performance  tests  simplified  porting.  For  the  new  I/O  performance 
tests,  both  the  file  and  console  I/O  tests  showed  large  differences  between  systems  -  which 
was  anticipated.  For  the  new  implicit  storage  reclamation  performance  tests,  it  was  surprising 
to  find  failing  systems. 

The  assessors  evaluate  features  that  were  believed  to  be  the  most  valuable  for  day  to  day 
usage,  but  not  all  of  these  capabilities  are  essential  to  have  a  usable  system.  The  Diagnos¬ 
tics  Assessor  proved  to  be  straightforward  to  execute  and  easy  to  adapt  User  feedback  on 
its  grading  system  will  give  an  idea  as  to  its  usefulness.  The  Library  and  the  Symbolic  De¬ 
bugger  Assessors  required  more  adaptation  effort  than  was  anticipated  Some  systems  do  not 
support  many  of  the  extensive  capabilities  explored  in  these  assessors,  but  the  assessors  were 
designed  with  future  technology  in  mind.  One  system,  the  1750A  cross  compiler,  did  not  have 
a  debugger  Most  of  the  trial  systems'  program  libraries  were  more  fragile  than  anticipated. 

The  MED  DATA  CONSTRUCTOR  program  was  an  improvement  to  the  FORMAT  - 
MEDIAN  interface.  This  program  worked  as  anticipated  by  significantly  reducing  the  manual 
effort  needed  to  format  the  data  for  the  MEDIAN  analysis. 

1  he  SSA  did  not  provide  any  new  information,  but  organized  the  available  data  and  pre¬ 
sented  it  so  that  the  user  did  not  have  to  tediously  compare  related  tests.  By  prcviding  a  tool 
which  automates  the  comparisons  of  related  tests,  the  SSA  will  give  more  insight  into  system 
perfnrmanr»»  with  less  iispt  effort  than  when  th»  mmparisons  had  to  Iip  mad°  hv  hand 

To  execute  this  ACEC  Release  2  0  test  suite  and  assessors  through  the  analysis  phase  is 
not  a  trivial  task.  A  lot  of  time,  effort  and  computer  resources  will  be  necessary  in  order 
to  extensively  test  an  Ada  compiler.  But,  when  a  user  has  completely  run  the  ACEC  test 
suite,  assessors  and  analysis  tools,  the  majority  of  a  compilation  system  s  features  will  have 
been  exercised  and  the  evaluator  will  have  a  good,  overall  knowledge  of  that  compiler  and 
associated  capabilities. 
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5  NOTES 

This  section  contains  information  only  and  is  not  contractually  binding. 


5.1  ABBREVIATIONS  AND  ACRONYMS 


ACEC 

ACVC 

BMA 

CPU 

I/O 

LRM 

NUMWG 


SSA 

TOR 


Ada  Compiler  Evaluation  Capability 
Ada  Compiler  Validation  Capability 

Boeing  Military  Airplanes 

Central  Processing  Unit 

Input  /  Output 

(Ada)  Language  Reference  Manual  (MIL-STD-1815A) 

Numerics  Working  Group  (NUMWG) 
subcommittee  of  the  Association  for 
Computing  Machinery,  Special  Interest 
Group  on  Ada 

Single  System  Analysis 

Technical  Operating  Report 


VDD 


Version  Description  Document 
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